Greedy parameter optimization for diabatic quantum annealing

A shorter processing time is desirable for quantum computation to minimize the effects of noise. We propose a simple procedure to variationally determine a set of parameters in the transverse-field Ising model for quantum annealing (QA) appended with a field along the y-axis. The method consists of greedy optimization of the signs of coefficients of the y-field term based on the outputs of short annealing processes. We test the idea in the ferromagnetic system with all-to-all couplings and spin-glass problems, and find that the method outperforms the traditional form of QA and simulated annealing in terms of the success probability and the time to solution, in particular, in the case of shorter annealing times, achieving the goal of improved performance while avoiding noise. The non-stoquastic σy term can be eliminated by a rotation in the spin space, resulting in a non-trivial diabatic control of the coefficients in the stoquastic transverse-field Ising model, which may be feasible for experimental realization. This article is part of the theme issue ‘Quantum annealing and computation: challenges and perspectives’.


Formulation
Let us define the following time-dependent Hamiltonian: where H z , H x and H y i are the problem (Ising) Hamiltonian, the transverse field term (to be called the x-field), and the y-field term, respectively, The symbols σ x i , σ y i and σ z i denote the components of the Pauli matrix at site (qubit) i, and J ij is for the interaction between sites i and j. The problem size is N. The term H y i renders the system non-stoquastic [53] and was introduced in [45] to approximately realize counterdiabatic driving. See also [44,48,50,54]. The time-dependent coefficients A(t), B(t) and C i (t) should satisfy the initial (t = 0) and final (t = τ ) conditions, following functions for these coefficients for simplicity though other (similar) functions can be considered [45], and C i (t) = c i sin 2 π t τ (2.4) with a = 1 to set the overall energy scale. The annealing time τ can be chosen arbitrarily and we test the cases of τ = 1 and τ = 5 in the following. Our strategy is to optimize b and c i under a variational principle to optimize appropriate measures. We simplify the process by fixing the amplitude of c i to a value found optimal in the mean-field case (to be described below) and choosing only the sign of c i in a greedy way. This facilitates the process considerably and yet will turn out to lead to significant performance improvements.
Our variational optimization is carried out by minimization of the expectation value of the final energy E = ψ(t = τ )|H z |ψ(t = τ ) , where ψ(t) is the wave function at time t, as well as by maximization of the ground state probability, or the fidelity, P gs = | ψ(t = τ )|ψ gs | 2 , where |ψ gs is the true ground state of the Ising Hamiltonian. Since we do not know the true ground state in a generic problem, we refer to the latter measure as an oracle.
The additional y-field term may not be straightforward to be implemented experimentally. Nevertheless, by a simple rotation of the spin axes [45,50], with θ i (t) = arctan(C i (t)/B(t)), we can transform the Hamiltonian into a transverse-field Ising model with non-trivial time dependence of coefficients,

Numerical results
In this section, we present numerical results for the mean-field, ferromagnetic and random (spin glass) systems. In addition, we discuss an additional algorithm which shortcuts the process of iterative determination of the signs of coefficients.

(a) Mean-field theory
As a preliminary to the next section of a ferromagnetic system, we first analyse the properties of the mean-field theory with the Hamiltonian (see [44] for a related idea) where σ z is the magnetization of the system ψ(t)|σ z |ψ(t) . We set τ = 1 in this section. The state vector can be expressed as a qubit |ψ(t) = α|0 + β|1 , where |0 = (1, 0) T and |1 = (0, 1) T , and the Schrödinger equation reads d dt where σ z is |α| 2 − |β| 2 . As the ground state of the final Hamiltonian is doubly degenerate, we choose the spin-up state σ z > 0 as the desired ground state. Two measures of success as defined in the previous section, the fidelity P gs = ( σ z + 1)/2 and the energy E = − σ z 2 , can be expressed by the magnetization σ z in the mean-field theory, which is not the case generally.    Figure 1 shows the dependence of magnetization σ z (t = 1) on b and c. In this landscape of the parameter space, we find the optimal values to maximize σ z by the classical optimizer, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [55][56][57][58], as b mf opt = 0.539 and c mf opt = 1.565 with σ z = 1.000. Notice that no greedy search is applied here since the problem is very simple in the mean-field case. Although finding optimal values of the coefficients is sufficient for our purpose in the following analysis, we illustrate in appendix B the time developments of the wave function, the coefficients, and the magnetization in the mean-field context, comparing mean-field results with the exact solution of the counterdiabatic term, for better understanding of the physics of the mean-field theory.

(b) Ferromagnetic system (i) Parameter dependence of the energy and fidelity
We move a step forward and analyse the ferromagnetic system with all-to-all interactions, The mean-field theory of the previous section is expected to give the correct solution to this model in the thermodynamic limit [59] though finite-size effects may reveal differences.
As the interactions are homogeneous, we drop the i dependence of c i and set all of them to c. To break the Z 2 symmetry, we choose positive values of b and c and focus our attention on the region around the optimal value obtained in the mean-field case, b ∈ [0, 1] and c ∈ [1,2]. Dependence of the fidelity and energy on b and c is shown in figure 2a,b, respectively, for N = 20. We also conducted the same analysis for the case where annealing is terminated before the designated annealing time (τ actual < τ) or carried through beyond (τ actual > τ), and the results are displayed in figure 2c,d. These plots indicate that the optimization is robust in b and sensitive to c and the actual annealing time. Similarities of figure 2a,b and of figure 2c,d suggest that the two measures of success, the fidelity and the energy, give essentially the same result, as anticipated from the mean-field case.
System size dependence of the optimal parameter values for each N, b N opt and c N opt , is shown in figure 3a,b. They show convergence beyond N ≈ 12 towards the values at N = 20 (b N=20 opt = 0.539, c N=20 opt = 1.564). These values are in good agreement with the results from the mean-field theory, b mf opt = 0.539 and c mf opt = 1.565, as expected. The error in magnetization 1 − σ z decreases with the system size, and we do not see a visible difference between the two measures in figure 3c,d. The residual error decreases polynomially as a function of the system size N. We will use the size-dependent parameters b N opt and c N opt obtained from fidelity optimization for further analysis.   pseudo-Boolean optimization and the roof duality algorithm solutions [60][61][62][63] and a samplingbased algorithm [64]. More precisely, we choose the optimal values of c i one by one with other coefficients fixed at their constant values. This approach reduces the search space considerably and also makes the energy (or fidelity) landscape simple.
The first step is to pick up an arbitrary i (e.g. i = 1) and optimize c 1 by setting other c j (j = 1) to zero. As shown in the rows of figure 4a,b for N = 8, we choose optimal c 1 by minimizing 1 − P gs (in (a)) or E (in (b)). See also figure 14, where the scale of the vertical axis is changed step by step for a better resolution. In practice, when we use E as the function to be minimized, this function is symmetric with respect to the change of sign of c 1 due to the double degeneracy of states. We therefore arbitrarily set c 1 to a positive value c BFGS 1 by BFGS. When we use 1 − P gs for minimization, we choose the all-up state as the target ground state, so no problem of degeneracy exists. We next choose another i (e.g. i = 2) and optimize c 2 by fixing c 1 to the already-optimized value and other c j (j = 1, 2) to zero. The c 2 dependence of 1 − P gs and E is shown in the second row of figure 4. This process is repeated until all parameters are fixed. It is seen in figure 4 that 1 − P gs and E become smaller as the iteration proceeds.
We have found that the optimal values c BFGS i are always close to c N opt : The resulting optimal values vary from site to site, ranging from 1.507 to 1.566, while c N=8 opt is 1.563 according to the analysis of the previous section. This means that our sequential greedy optimization strategy finds the ground state with good precision for the ferromagnetic system with all-to-all couplings.

(c) Spin glass problem
We apply essentially the same procedure as developed above to the prototypical hard optimization problem of spin glass.
The algorithm proceeds as follows. The value of b is set to b N opt obtained for the ferromagnetic case, and the absolute value of c i is fixed to c N opt as demonstrated in the ferromagnetic case. We only choose the sign of each c i step by step, site to site, in a greedy way. It will be seen that this simple process leads to significant improvements in performance as compared to the traditional QA and classical SA. The sign of c i is chosen by the average gradient of the optimization measure (E or 1 − P gs ) near the origin 0 ≤ c i ≤ = 0.1 with the values of other c j 's (j = i) fixed. The reason is that the optimization measure has only a single minimum, whose position can be detected near the origin, as a function of a single c i , as will be illustrated later.
More precisely, we first choose an arbitrary i (e.g. i = 8) and fix all other c j to zero. The optimization measure E is symmetric with respect to the inversion c 8 → −c 8 , and we therefore look instead at the curvature of E in the asymmetric range 0 ≤ c 8 ≤ . This process is repeated for all i and we choose the site i with the largest absolute average curvature. If this turns out to be i = 8 with a positive derivative, we set c 8 = −c N opt , which breaks the symmetry. We next choose another i (e.g. i = 6) and see the sign of the average derivative of E near the origin with c 8 fixed to −c N opt and all other c i 's to zero. We choose another i and the same process is repeated for all i(= 1, 2, 3, 4, 5, 7) and we select the site with the largest value of the absolute average derivative (e.g. i = 6 with a negative derivative) and assign the sign as c 6 = +c N opt . Iterating this process, all c i 's are assigned fixed values. See algorithm 1.

Algorithm 1. Sequential QGO.
Input: QA measure f (b, c) Output: a solution of the cost function An example of this process is illustrated in figure 5 for the all-to-all coupling Sherrington-Kirkpatrick model of spin glass [65] with the distribution function of interaction, We observe that the difference is minimal, at most 0.4%, and the signs are correctly reproduced by the greedy algorithm.  For a more systematic analysis, we applied the algorithm to the spin glass with various sizes from N = 4 to 16 with 100 random instances for each N. The resulting success probability is plotted as a function of size in figure 6a for τ = 1 and 5. QGO and QA obey the Schrödinger dynamics, and the calculation was conducted by QuTiP [66]. Also plotted are the success probabilities by the traditional QA and SA. The latter classical algorithm has been tested by the following schedule of temperature decrease (inverse temperature increase) where β(0) = 0 and β(τ ) = 10 and by solving the classical master equation. QGO returns not a state vector in QA or a probability distribution in SA but a specific solution of spin configuration, which is the output from the algorithm 1. Thus the success probability of QGO refers to a portion of instances with success. Confidence intervals are calculated by bootstrapping from 10 000 resamples. It is clear that the present algorithm QGO far outperforms the other two. Another viewpoint is provided by the time-to-solution (TTS), a standard measure of computation time for heuristic algorithms [67]. It is defined as where p success is the empirical probability of success for the computation time τ , and P is the target success probability often set to 0.99. The result is plotted in figure 6b, which apparently shows a low computational cost of the present algorithm QGO drawn in blue. We should note, however, that QGO carries an overhead of repeated computation of the parameter dependence of E at each iteration. QGO calls QA as a subroutine N(N + 3)/2 times in total to determine the gradients. This number is derived as follows: in the n-th iteration (n = 1, . . . , N), we run QA at (c 1 , . . . , c N ) once and at (c 1 , . . . , c i + , . . . , c N

(d) Further simplifications of the algorithm
To see if we can further simplify the algorithm without much compromising the performance, we test two possibilities in this section.

(i) y-Field optimization
The first example is to drive the state only by H y , i.e. by a rotation around the y-axis, starting from the ground state of H x and to variationally optimize the parameters θ to minimize E as illustrated symbolically in figure 8a. Results for 100 spin glass instances are shown in figure 8b. It is seen that this y-field optimization performs slightly worse than QGO for τ = 1. It is also noted by comparison with figure 6a that the y-field optimization leads to better results than QA and SA with τ = 1. Thus the present method may be useful for some purposes due to its simplicity and its easiness of implementation possibly on a gate-based hardware.
(ii) Single-shot QGO Let us next study what happens if we fix the signs of coefficients of c i in a single shot without iteration. We first fix one of the parameters, e.g. c 1 = c N opt (spin-up), to break the Z 2 symmetry. Signs of other sites are next fixed according to the gradient of the optimization function, 1 − P gs or E, near the origin as shown in algorithm 2.

Algorithm 2. Single-shot QGO.
Input: QA measure f (b, c) Figure 9a shows the c i dependence of 1 − P gs for each spin to confirm our assumption that the optimal c i is close to ±c N opt . Blue-dashed and solid curves represent that the optimal values are negative and positive, respectively. Grey-dotted curves indicate re-evaluation of the fixedparameter, similarly to figure 5. The correct solution has been obtained by this single-shot QGO using 1 − P gs for this instance. When the energy measure E is employed for optimization as shown in figure 9b, two spins (the sixth and seventh) are fixed incorrectly. As illustrated in figure 9c, the ground state is retrieved if the two spins are re-evaluated after the other six spins are fixed. This suggests that we need iterations of the process for better results if we use E for optimization.
It is also noted that minimum values in the sixth spin in figure 9b are not around ±c N=8 opt = ±1.563 but are around 0.7 (red arrow). This implies that our assumption, optimal values are around ±c N opt is not valid in this case, which would lead to the difference in the performance between (a) and (b). Figure 10 shows the performances of the single-shot QGO for two measures of optimization for 100 spin glass instances. The single-shot QGO with the fidelity measure always succeeds in identifying the ground state, while the success probability quickly dumps under the energy measure. The fidelity measure is the overlap between the final state and the ground state and thus includes information of the ground state, leading to better results.

Summary and discussion
We have formulated a variational algorithm of QGO aiming to achieve improved performance of QA through a simple process of adjustment of signs of y-field coefficients. The result showed notable improvements in the success probability over the original method as well as over classical SA under comparable computational costs. Since we adjust the signs of coefficients sequentially, the energy landscape of each step is very simple with only a single minimum at almost the same absolute values of coefficients, the adjustment or optimization process does not encounter the problems of barren plateau in deep variational quantum circuits [68] or highly complicated landscapes which plague well-known variational algorithms such as VQE and QAOA [69]. Another advantage of the present method is the simplicity of the additional term in the Hamiltonian, the y-field, which can be rewritten in terms of the transverse-field Ising model without y-field by a rotation in the spin space. The latter can be implemented experimentally if the hardware can be designed to allow for the non-monotonic time dependence of coefficients of the x field and the Ising part of the Hamiltonian. It is in principle possible to apply the same idea to a final Hamiltonian with x-and y-components of the Pauli matrix, typically for problems of quantum state preparation in chemistry, although it is non-trivial if the present protocol would lead to satisfactory results in such cases. In the real settings of quantum processors, we cannot measure the fidelity because the target state is unknown. The fidelity analysis provides the algorithm's upper limit, revealing that the success probability can be 100% and suggesting that the goodness of the cost function defines the algorithm's performance. This finding is informative for the further development of algorithms.
Data accessibility. The code and data are provided in electronic supplementary material [70]. Authors' contributions. T.K.: conceptualization, investigation, writing-original draft; H.N.: conceptualization, writing-review and editing.
Both authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration. We declare that we have no competing interests. Funding. No funding has been received for this article.
Appendix A. Time dependence of the coefficients in the rotated frame An example of the behaviour of the coefficients in the rotated frame, equation (2.6), is given in this appendix. Let us write the Hamiltonian as and .
( A 3 ) Figure 11 shows the time dependence of A(t), B i (t) and C i (t) for a = 1, b = 0.5, c i = 1.5 and τ = 1. It is observed that B (t) and C (t) are non-monotonic, which suppresses system's transition to excited states in the final state.

13
royalsocietypublishing.org/journal/rsta Phil.  Appendix B. System behaviour of the mean-field theory To visualize the system behaviour under the mean-field theory discussed in §3a, we plot the trajectory of the system on the Bloch sphere in figure 12 for a number of parameter values with b = 0.5 (green) and c = 1.5 (long dashed) being closest to the optima. The system starts along the x-axis and is supposed to end ideally at the north pole |0 . The optimal parameter values are observed to approximately follow a straightforward path connecting these points. Figure 13 compares the exact counterdiabatic driving function for the mean-field theory [40] and our approximate result. The parametrized Ising Hamiltonian is